By- Suraj Honkamble
The dataset was released by Aspiring Minds from the Aspiring Mind Employment Outcome 2015 (AMEO). The study is primarily limited only to students with engineering disciplines. The dataset contains the employment outcomes of engineering graduates as dependent variables (Salary, Job Titles, and Job Locations) along with the standardized scores from three different areas – cognitive skills, technical skills and personality skills. The dataset also contains demographic features. The dataset contains around 40 independent variables and 4000 data points. The independent variables are both continuous and categorical in nature. The dataset contains a unique identifier for each candidate.
“After doing your Computer Science Engineering if you take up jobs as a Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer you can earn up to 2.5-3 lakhs as a fresh
graduate.” Test this claim with the data given to you.import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')
pd.options.display.max_columns=50
df=pd.read_excel('D:\DATA SCIENCE Internship with Innomatics\Data Analysis\\aspiring_minds_employability_outcomes_2015.xlsx')
df.head()
| Unnamed: 0 | ID | Salary | DOJ | DOL | Designation | JobCity | Gender | DOB | 10percentage | 10board | 12graduation | 12percentage | 12board | CollegeID | CollegeTier | Degree | Specialization | collegeGPA | CollegeCityID | CollegeCityTier | CollegeState | GraduationYear | English | Logical | Quant | Domain | ComputerProgramming | ElectronicsAndSemicon | ComputerScience | MechanicalEngg | ElectricalEngg | TelecomEngg | CivilEngg | conscientiousness | agreeableness | extraversion | nueroticism | openess_to_experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | train | 203097 | 420000 | 2012-06-01 | present | senior quality engineer | Bangalore | f | 1990-02-19 | 84.3 | board ofsecondary education,ap | 2007 | 95.8 | board of intermediate education,ap | 1141 | 2 | B.Tech/B.E. | computer engineering | 78.00 | 1141 | 0 | Andhra Pradesh | 2011 | 515 | 585 | 525 | 0.635979 | 445 | -1 | -1 | -1 | -1 | -1 | -1 | 0.9737 | 0.8128 | 0.5269 | 1.35490 | -0.4455 |
| 1 | train | 579905 | 500000 | 2013-09-01 | present | assistant manager | Indore | m | 1989-10-04 | 85.4 | cbse | 2007 | 85.0 | cbse | 5807 | 2 | B.Tech/B.E. | electronics and communication engineering | 70.06 | 5807 | 0 | Madhya Pradesh | 2012 | 695 | 610 | 780 | 0.960603 | -1 | 466 | -1 | -1 | -1 | -1 | -1 | -0.7335 | 0.3789 | 1.2396 | -0.10760 | 0.8637 |
| 2 | train | 810601 | 325000 | 2014-06-01 | present | systems engineer | Chennai | f | 1992-08-03 | 85.0 | cbse | 2010 | 68.2 | cbse | 64 | 2 | B.Tech/B.E. | information technology | 70.00 | 64 | 0 | Uttar Pradesh | 2014 | 615 | 545 | 370 | 0.450877 | 395 | -1 | -1 | -1 | -1 | -1 | -1 | 0.2718 | 1.7109 | 0.1637 | -0.86820 | 0.6721 |
| 3 | train | 267447 | 1100000 | 2011-07-01 | present | senior software engineer | Gurgaon | m | 1989-12-05 | 85.6 | cbse | 2007 | 83.6 | cbse | 6920 | 1 | B.Tech/B.E. | computer engineering | 74.64 | 6920 | 1 | Delhi | 2011 | 635 | 585 | 625 | 0.974396 | 615 | -1 | -1 | -1 | -1 | -1 | -1 | 0.0464 | 0.3448 | -0.3440 | -0.40780 | -0.9194 |
| 4 | train | 343523 | 200000 | 2014-03-01 | 2015-03-01 00:00:00 | get | Manesar | m | 1991-02-27 | 78.0 | cbse | 2008 | 76.8 | cbse | 11368 | 2 | B.Tech/B.E. | electronics and communication engineering | 73.90 | 11368 | 0 | Uttar Pradesh | 2012 | 545 | 625 | 465 | 0.124502 | -1 | 233 | -1 | -1 | -1 | -1 | -1 | -0.8810 | -0.2793 | -1.0697 | 0.09163 | -0.1295 |
df.shape
(3998, 39)
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3998 entries, 0 to 3997 Data columns (total 39 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Unnamed: 0 3998 non-null object 1 ID 3998 non-null int64 2 Salary 3998 non-null int64 3 DOJ 3998 non-null datetime64[ns] 4 DOL 3998 non-null object 5 Designation 3998 non-null object 6 JobCity 3998 non-null object 7 Gender 3998 non-null object 8 DOB 3998 non-null datetime64[ns] 9 10percentage 3998 non-null float64 10 10board 3998 non-null object 11 12graduation 3998 non-null int64 12 12percentage 3998 non-null float64 13 12board 3998 non-null object 14 CollegeID 3998 non-null int64 15 CollegeTier 3998 non-null int64 16 Degree 3998 non-null object 17 Specialization 3998 non-null object 18 collegeGPA 3998 non-null float64 19 CollegeCityID 3998 non-null int64 20 CollegeCityTier 3998 non-null int64 21 CollegeState 3998 non-null object 22 GraduationYear 3998 non-null int64 23 English 3998 non-null int64 24 Logical 3998 non-null int64 25 Quant 3998 non-null int64 26 Domain 3998 non-null float64 27 ComputerProgramming 3998 non-null int64 28 ElectronicsAndSemicon 3998 non-null int64 29 ComputerScience 3998 non-null int64 30 MechanicalEngg 3998 non-null int64 31 ElectricalEngg 3998 non-null int64 32 TelecomEngg 3998 non-null int64 33 CivilEngg 3998 non-null int64 34 conscientiousness 3998 non-null float64 35 agreeableness 3998 non-null float64 36 extraversion 3998 non-null float64 37 nueroticism 3998 non-null float64 38 openess_to_experience 3998 non-null float64 dtypes: datetime64[ns](2), float64(9), int64(18), object(10) memory usage: 1.2+ MB
df.isna().sum()
Unnamed: 0 0 ID 0 Salary 0 DOJ 0 DOL 0 Designation 0 JobCity 0 Gender 0 DOB 0 10percentage 0 10board 0 12graduation 0 12percentage 0 12board 0 CollegeID 0 CollegeTier 0 Degree 0 Specialization 0 collegeGPA 0 CollegeCityID 0 CollegeCityTier 0 CollegeState 0 GraduationYear 0 English 0 Logical 0 Quant 0 Domain 0 ComputerProgramming 0 ElectronicsAndSemicon 0 ComputerScience 0 MechanicalEngg 0 ElectricalEngg 0 TelecomEngg 0 CivilEngg 0 conscientiousness 0 agreeableness 0 extraversion 0 nueroticism 0 openess_to_experience 0 dtype: int64
No Missing values.
df['Unnamed: 0'].unique()
array(['train'], dtype=object)
df.drop('Unnamed: 0', axis=1, inplace=True)
df['ID'].nunique()
3998
df['Salary'].dtype
dtype('int64')
print("Minimum Salary", df['Salary'].min())
print("Maximum Salary", df['Salary'].max())
print("Average Salary", df['Salary'].mean())
print("Median Salary", df['Salary'].median())
Minimum Salary 35000 Maximum Salary 4000000 Average Salary 307699.8499249625 Median Salary 300000.0
Mean and Median for Salary column is nearly equal, so might be less posibility that this columns contains any outlier.
plt.figure(figsize=(12,4))
sns.boxplot(df['Salary'], color='pink')
plt.title('Distribution of Salary befor Log Tranformation', fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.distplot(df['Salary'], color='orange')
plt.title('Distribution of Salary befor Log Tranformation', fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.distplot(np.log(df['Salary']), color='orange')
plt.title('Distribution of Salary after Log Tranformation', fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.boxplot(np.log(df['Salary']), color='pink')
plt.title('Distribution of Salary after Log Tranformation', fontsize=18, color='blue');
Now the Distribution somewhat looks similar to normal, and we are get rid of outliers which are laying far away from the group of data.
print(df['DOJ'].dtype)
datetime64[ns]
print("First Recruitment Date:", df['DOJ'].min())
print("Last Recruitment Date:", df['DOJ'].max())
First Recruitment Date: 1991-06-01 00:00:00 Last Recruitment Date: 2015-12-01 00:00:00
First Employee joined the Company on 1st June 1991 and last employee on 1st December 2015.
df['Year_of_Joining']=df['DOJ'].dt.year
plt.figure(figsize=(12,4))
sns.countplot(x='Year_of_Joining', data=df);
plt.title("Recruitment by Year", fontsize=18, color='blue');
df['DOL'].dtype
dtype('O')
df['DOL'].head()
0 present 1 present 2 present 3 present 4 2015-03-01 00:00:00 Name: DOL, dtype: object
"01-01-2016". Then convert it to datetime column.¶present_date='2016-01-01'
df['DOL'].replace({"present": present_date}, inplace=True)
df['DOL'].head()
0 2016-01-01 1 2016-01-01 2 2016-01-01 3 2016-01-01 4 2015-03-01 00:00:00 Name: DOL, dtype: object
df['DOL']=pd.to_datetime(df['DOL'])
print(df['DOL'].dtype)
datetime64[ns]
df[df['DOL']=='2016-01-01'].shape[0]
1875
1875 Employees still working in this company.
df['Year_of_Leaving']=df['DOL'].dt.year
year_of_leaving=df[df['Year_of_Leaving']!=2016]
plt.figure(figsize=(12,4))
sns.countplot(x='Year_of_Leaving', data=year_of_leaving);
plt.title("Count of Employees left by each year", fontsize=18, color='blue');
Every calender year the count of employees who left the company is increasing.
df['Designation'].dtype
dtype('O')
df['Designation'].nunique()
419
df['Designation'].unique()
array(['senior quality engineer', 'assistant manager', 'systems engineer',
'senior software engineer', 'get', 'system engineer',
'java software engineer', 'mechanical engineer',
'electrical engineer', 'project engineer', 'senior php developer',
'senior systems engineer', 'quality assurance engineer',
'qa analyst', 'network engineer', 'product development engineer',
'associate software developer', 'data entry operator',
'software engineer', 'developer', 'electrical project engineer',
'programmer analyst', 'systems analyst', 'ase',
'telecommunication engineer', 'application developer',
'ios developer', 'executive assistant', 'online marketing manager',
'documentation specialist', 'associate software engineer',
'management trainee', 'site manager', 'software developer',
'.net developer', 'production engineer', 'jr. software engineer',
'trainee software developer', 'ui developer',
'assistant system engineer', 'android developer',
'customer service', 'test engineer', 'java developer', 'engineer',
'recruitment coordinator', 'technical support engineer',
'data analyst', 'assistant software engineer', 'faculty',
'entry level management trainee',
'customer service representative', 'software test engineer',
'firmware engineer', 'php developer', 'research associate',
'research analyst', 'quality engineer', 'programmer',
'technical support executive', 'business analyst', 'web developer',
'application engineer', 'project coordinator', 'engineer trainee',
'sap consultant', 'quality analyst', 'marketing coordinator',
'system administrator', 'senior engineer',
'business development managerde', 'network administrator',
'technical support specialist', 'business development executive',
'junior software engineer', 'asp.net developer',
'graduate engineer trainee', 'field engineer',
'assistant professor', 'trainee software engineer',
'senior software developer',
'quality assurance automation engineer', 'design engineer',
'telecom engineer', 'quality control engineer',
'hardware engineer', 'hr recruiter', 'sales associate',
'junior engineer', 'associate engineer', 'maintenance engineer',
'sales engineer', 'human resources associate',
'mobile application developer',
'electronic field service engineer', 'process associate',
'field service engineer', 'it support specialist',
'software development engineer', 'business process analyst',
'operation engineer', 'electrical designer', 'marketing assistant',
'sales executive', 'admin assistant', 'senior java developer',
'account executive', 'oracle dba', 'rf engineer',
'embedded software engineer', 'programmer analyst trainee',
'technical engineer', 'operations executive', 'trainee engineer',
'recruiter', 'lecturer', '.net web developer',
'marketing executive', 'operations assistant', 'associate manager',
'electrical design engineer', 'systems administrator',
'client services associate', 'it analyst', 'senior developer',
'cad designer', 'business technology analyst', 'asst. manager',
'service engineer', 'executive recruiter', 'planning engineer',
'associate technical operations', 'web designer',
'software architect', 'software quality assurance tester',
'seo trainee', 'process engineer',
'software quality assurance analyst', 'designer',
'business systems consultant', 'business development manager',
'junior research fellow', 'technical recruiter',
'operations analyst', 'quality assurance test engineer',
'linux systems administrator', 'software trainee',
'entry level sales and marketing', 'electrical field engineer',
'windows systems administrator', 'junior software developer',
'python developer', 'web application developer',
'assistant systems engineer', 'javascript developer',
'operation executive', 'performance engineer', 'technical writer',
'operations engineer and jetty handling', 'lead engineer',
'portfolio analyst', 'associate system engineer',
'mechanical design engineer', 'product engineer',
'network security engineer', 'operations manager',
'technical lead', 'operations', 'quality assurance tester',
'automation engineer', 'data scientist', 'quality associate',
'manual tester', 'sr. engineer', 'embedded engineer',
'service and sales engineer', 'telecom support engineer',
'engineer- customer support', 'cloud engineer', 'branch manager',
'business analyst consultant', 'technology lead',
'software trainee engineer', 'dcs engineer', 'junior manager',
'ux designer', 'clerical', 'hr generalist',
'database administrator', 'senior design engineer', 'seo',
'assistant engineer', 'marketing analyst', 'it executive',
'salesforce developer', 'software tester', 'sql dba',
'junior engineer product support', 'manager',
'senior business analyst', 'c# developer',
'implementation engineer', 'executive hr', 'executive engineer',
'sharepoint developer', 'system analyst',
'sales management trainee', 'senior project engineer',
'it recruiter', 'software engineer analyst',
'desktop support technician', 'continuous improvement engineer',
'process advisor', 'etl developer', 'sales and service engineer',
'project manager', 'training specialist', 'product manager',
'staffing recruiter', 'assistant programmer', 'quality controller',
'mis executive', 'game developer', 'digital marketing specialist',
'principal software engineer', 'software devloper',
'senior mechanical engineer', 'technical operations analyst',
'service coordinator', 'testing engineer', 'technical assistant',
'sap abap consultant', 'seo engineer', 'project assistant',
'talent acquisition specialist', 'sales account manager',
'software engineer trainee', 'customer service manager',
'help desk analyst', 'general manager', 'engineering manager',
'senior network engineer',
'field based employee relations manager', 'phone banking officer',
'support engineer', 'associate test engineer',
'technology analyst', 'network support engineer',
'it business analyst', 'junior system analyst',
'senior .net developer', 'secretary', 'research engineer',
'quality assurance auditor', 'process executive',
'lecturer & electrical maintenance', 'office coordinator',
'hr manager', 'html developer', 'sales support',
'front end web developer', 'administrative support',
'territory sales manager', 'project administrator',
'environmental engineer', 'web designer and seo',
'information security analyst',
'field business development associate', 'operational executive',
'administrative coordinator', 'senior risk consultant',
'desktop support engineer', 'cad drafter', 'noc engineer',
'industrial engineer', 'it engineer', 'human resources intern',
'senior quality assurance engineer', 'clerical assistant',
'software enginner', 'quality assurance',
'delivery software engineer', 'graphic designer',
'sales development manager', 'visiting faculty',
'business intelligence analyst', 'team lead',
'operational excellence manager', 'sales & service engineer',
'web intern', 'full stack developer', 'database developer',
'sr. database engineer', 'graduate apprentice trainee',
'software engineer associate', 'technical analyst',
'executive engg', 'it technician', 'business system analyst',
'process control engineer', 'technical consultant',
'business office manager', 'quality control inspector',
'product design engineer', 'manufacturing engineer',
'seo executive', 'sap analyst', 'software engineere',
'financial service consultant', 'co faculty', 'software analyst',
'desktop support analyst', 'graduate engineer',
'engineering technician', 'it assistant', 'marketing manager',
'human resource assistant', 'hr assistant', 'product developer',
'customer support engineer',
'quality control inspection technician', 'gis/cad engineer',
'senior web developer', 'sql developer', 'research staff member',
'sap abap associate consultant', 'associate qa',
'corporate recruiter', 'project management officer',
'business systems analyst', 'software programmer',
'help desk technician', 'sales manager', 'catalog associate',
'assistant store manager', 'software engg', 'it developer',
'apprentice', 'business consultant', 'controls engineer',
'ruby on rails developer', 'risk consultant', 'account manager',
'professor', 'assistant administrator', 'civil engineer',
'educator', 'service manager', 'teradata dba',
'full-time loss prevention associate', 'junior recruiter',
'associate developer', 'assistant electrical engineer',
'shift engineer', 'dotnet developer', 'rf/dt engineer',
'human resources analyst', 'software test engineerte',
'junior .net developer', 'java trainee', 'maintenance supervisor',
'r&d engineer', 'front end developer', 'engineer-hws',
'operations engineer', 'senior research fellow',
'web designer and joomla administrator',
'enterprise solutions developer',
'information technology specialist', 'site engineer',
'graduate trainee engineer', 'quality assurance analyst',
'cnc programmer', 'financial analyst', 'system engineer trainee',
'sap mm consultant', 'assistant system engineer trainee',
'qa trainee', 'teradata developer', 'hr executive',
'senior programmer', 'software test engineer (etl)',
'associate software engg', 'supply chain analyst', 'sales trainer',
'software executive', 'team leader',
'assistant system engineer - trainee', 'seo analyst',
'risk investigator', 'executive administrative assistant',
'program manager', 'r & d', 'sap functional consultant',
'website developer/tester', 'software designer',
'sales coordinator', 'qa engineer', 'aircraft technician',
'customer care executive', 'senior test engineer',
'program analyst trainee', 'electrical controls engineer',
'trainee decision scientist', 'editor', 'bss engineer', 'dba',
'software eng', 'computer faculty', 'recruitment associate',
'logistics executive', 'quality consultant',
'senior sales executive', 'db2 dba', 'test technician',
'it operations associate', 'software engineering associate',
'research scientist', 'jr. software developer'], dtype=object)
df['Designation'].replace({"ase":"associate software engineer", "get":"graduate engineer trainee",
"graduate trainee engineer":"graduate engineer trainee", "software engineer associate":"associate software engineer",
"business development managerde":"business development manager",
"jr. software developer":"junior software developer", "software trainee engineer":"software engineer trainee",
"asst. manager":"assistant manager", "dotnet developer":".net developer",
"qa analyst":"quality assurance analyst","associate software engg":"associate software engineer",
"assistant system engineer - trainee":"assistant system engineer trainee",
"software eng":"software engineer", "software engineering associate":"software engineer associate",
"software enginner":"software engineer", "systems engineer":"system engineer"}, inplace=True)
df['Designation'].nunique()
404
df['Designation']=df['Designation'].str.title()
df['Designation']
0 Senior Quality Engineer
1 Assistant Manager
2 System Engineer
3 Senior Software Engineer
4 Graduate Engineer Trainee
...
3993 Software Engineer
3994 Technical Writer
3995 Associate Software Engineer
3996 Software Developer
3997 Senior Systems Engineer
Name: Designation, Length: 3998, dtype: object
job_role=df['Designation'].value_counts()[:20].reset_index()
plt.figure(figsize=(12,4))
sns.barplot(x='index', y='Designation', data=job_role)
plt.xticks(rotation=60, fontsize=12)
plt.ylabel("Count of Designation")
plt.xlabel("Designation")
plt.title("Count of Enployees working in specific Designation", fontsize=18, color='blue');
405 unique designations.3998 employees, 500+ employees working as Software Engineer. Related to Software Development field.
print("Total Business analysts: ", df[df['Designation']=='Business Analyst'].shape[0])
print("Total Data analysts: ", df[df['Designation']=='Data Analyst'].shape[0])
print("Total Python Developers: ", df[df['Designation']=='Python Developer'].shape[0])
print("Total Data Scientists: ", df[df['Designation']=='Data Scientist'].shape[0])
Total Business analysts: 49 Total Data analysts: 49 Total Python Developers: 1 Total Data Scientists: 3
df['JobCity'].dtype
dtype('O')
df['JobCity'].nunique()
339
df['JobCity'].unique()
array(['Bangalore', 'Indore', 'Chennai', 'Gurgaon', 'Manesar',
'Hyderabad', 'Banglore', 'Noida', 'Kolkata', 'Pune', -1, 'mohali',
'Jhansi', 'Delhi', 'Hyderabad ', 'Bangalore ', 'noida', 'delhi',
'Bhubaneswar', 'Navi Mumbai', 'Mumbai', 'New Delhi', 'Mangalore',
'Rewari', 'Gaziabaad', 'Bhiwadi', 'Mysore', 'Rajkot',
'Greater Noida', 'Jaipur', 'noida ', 'HYDERABAD', 'mysore',
'THANE', 'Maharajganj', 'Thiruvananthapuram', 'Punchkula',
'Bhubaneshwar', 'Pune ', 'coimbatore', 'Dhanbad', 'Lucknow',
'Trivandrum', 'kolkata', 'mumbai', 'Gandhi Nagar', 'Una',
'Daman and Diu', 'chennai', 'GURGOAN', 'vsakhapttnam', 'pune',
'Nagpur', 'Bhagalpur', 'new delhi - jaisalmer', 'Coimbatore',
'Ahmedabad', 'Kochi/Cochin', 'Bankura', 'Bengaluru', 'Mysore ',
'Kanpur ', 'jaipur', 'Gurgaon ', 'bangalore', 'CHENNAI',
'Vijayawada', 'Kochi', 'Beawar', 'Alwar', 'NOIDA', 'Greater noida',
'Siliguri ', 'raipur', 'gurgaon', 'Bhopal', 'Faridabad', 'Jodhpur',
'udaipur', 'Muzaffarpur', 'Kolkata`', 'Bulandshahar', 'Haridwar',
'Raigarh', 'Visakhapatnam', 'Jabalpur', 'hyderabad', 'Unnao',
'KOLKATA', 'Thane', 'Aurangabad', 'Belgaum', 'gurgoan', 'Dehradun',
'Rudrapur', 'Jamshedpur', 'vizag', 'Nouda', 'Dharamshala',
'Banagalore', 'Hissar', 'Ranchi', 'BANGALORE', 'Madurai', 'Gurga',
'Chandigarh', 'Australia', ' Chennai', 'CHEYYAR', 'Mumbai ',
'sonepat', 'Ghaziabad', 'Pantnagar', 'Siliguri', 'mumbai ',
'Jagdalpur', 'Chennai ', 'angul', 'Baroda', ' ariyalur', 'Jowai',
'Kochi/Cochin, Chennai and Coimbatore', 'bhubaneswar', 'Neemrana',
'VIZAG', 'Tirupathi', 'Lucknow ', 'Ahmedabad ', 'Bhubneshwar',
'Noida ', 'pune ', 'Calicut', 'Gandhinagar', 'LUCKNOW', 'Dubai',
'bengaluru', 'MUMBAI', 'Ahmednagar', 'Nashik', 'New delhi',
'Bellary', 'Ludhiana', 'New Delhi ', 'Muzaffarnagar', 'BHOPAL',
'Gurgoan', 'Gagret', 'Indirapuram, Ghaziabad', 'Gwalior',
'new delhi', 'TRIVANDRUM', 'Chennai & Mumbai', 'Rajasthan',
'Sonipat', 'Bareli', 'Kanpur', 'Hospete', 'Miryalaguda', ' mumbai',
'Dharuhera', 'lucknow', 'meerut', 'dehradun', 'Ganjam', 'Hubli',
'bangalore ', 'NAVI MUMBAI', 'ncr', 'Agra', 'Trichy',
'kudankulam ,tarapur', 'Ongole', 'Sambalpur', 'Pondicherry',
'Bundi', 'SADULPUR,RAJGARH,DISTT-CHURU,RAJASTHAN', 'AM', 'Bikaner',
'Vadodara', 'BAngalore', 'india', 'Asansol', 'Tirunelvelli',
'Ernakulam', 'DELHI', 'Bilaspur', 'Chandrapur', 'Nanded',
'Dharmapuri', 'Vandavasi', 'Rohtak', 'trivandrum', 'Nagpur ',
'Udaipur', 'Patna', 'banglore', 'indore', 'Salem', 'Nasikcity',
'Gandhinagar ', 'Technopark, Trivandrum', 'Bharuch', 'Tornagallu',
'Raipur', 'Kolkata ', 'Jaspur', 'Burdwan', 'Bhubaneswar ',
'Shimla', 'ahmedabad', 'Gajiabaad', 'Jammu', 'Shahdol',
'Muvattupuzha', 'Al Jubail,Saudi Arabia', 'Kalmar, Sweden',
'Secunderabad', 'A-64,sec-64,noida', 'Ratnagiri', 'Jhajjar',
'Gulbarga', 'hyderabad(bhadurpally)', 'Nalagarh', 'Chandigarh ',
'Jaipur ', 'Jeddah Saudi Arabia', ' Delhi', 'PATNA', 'SHAHDOL',
'Chennai, Bangalore', 'Bhopal ', 'Jamnagar', 'PUNE', 'Tirupati',
'Gonda', 'jamnagar', 'chennai ', 'orissa', 'kharagpur',
'Trivandrum ', 'Navi Mumbai , Hyderabad', 'Joshimath',
'chandigarh', 'Bathinda', 'Johannesburg', 'kala amb ', 'Karnal',
'LONDON', 'Kota', 'Panchkula', 'Baddi HP', 'Nagari',
'Mettur, Tamil Nadu ', 'Durgapur', 'pondi', 'Surat', 'Kurnool',
'kolhapur', 'Madurai ', 'GREATER NOIDA', 'Bhilai', ' Pune',
'hderabad', 'KOTA', 'thane', 'Vizag', 'Bahadurgarh',
'Rayagada, Odisha', 'kakinada', 'GURGAON', 'Varanasi', 'punr',
'Nellore', 'patna', 'Meerut', 'hyderabad ', 'Sahibabad', 'Howrah',
'BHUBANESWAR', 'Trichur', 'Ambala', 'Khopoli', 'keral', 'Roorkee',
'Greater NOIDA', 'Navi mumbai', 'ghaziabad', 'Allahabad',
'Delhi/NCR', 'Panchkula ', 'Ranchi ', 'Jalandhar', 'manesar',
'vapi', 'PILANI', 'muzzafarpur', 'RAS AL KHAIMAH', 'bihar',
'singaruli', 'KANPUR', 'Banglore ', 'pondy', 'Mohali', 'Phagwara',
' Mumbai', ' bangalore', 'GURAGAON', 'Baripada', 'MEERUT',
'Yamuna Nagar', 'shahibabad', 'sampla', 'Guwahati', 'Rourkela',
'Banaglore', 'Vellore', 'Dausa', 'latur (Maharashtra )',
'NEW DELHI', 'kanpur', 'Mainpuri', 'karnal', 'Dammam', 'Haldia',
'sambalpur', 'RAE BARELI', 'ranchi', 'jAipur', 'BANGLORE',
'Patiala', 'Gorakhpur', 'new dehli', 'BANGALORE ', 'Ambala City',
'Karad', 'Rajpura', 'Pilani', 'haryana', 'Asifabadbanglore'],
dtype=object)
Many Job Cities are repeated with some spelling change, some white spaces at beginning or at end, Some are in upper case and some are lower case, some are with old city names(the city name changed now.). Lets apply string methods to clean the names.
df['JobCity']=df['JobCity'].str.replace(" ","").str.lower()
df['JobCity'].nunique()
230
df['JobCity'].unique()
array(['bangalore', 'indore', 'chennai', 'gurgaon', 'manesar',
'hyderabad', 'banglore', 'noida', 'kolkata', 'pune', nan, 'mohali',
'jhansi', 'delhi', 'bhubaneswar', 'navimumbai', 'mumbai',
'newdelhi', 'mangalore', 'rewari', 'gaziabaad', 'bhiwadi',
'mysore', 'rajkot', 'greaternoida', 'jaipur', 'thane',
'maharajganj', 'thiruvananthapuram', 'punchkula', 'bhubaneshwar',
'coimbatore', 'dhanbad', 'lucknow', 'trivandrum', 'gandhinagar',
'una', 'damananddiu', 'gurgoan', 'vsakhapttnam', 'nagpur',
'bhagalpur', 'newdelhi-jaisalmer', 'ahmedabad', 'kochi/cochin',
'bankura', 'bengaluru', 'kanpur', 'vijayawada', 'kochi', 'beawar',
'alwar', 'siliguri', 'raipur', 'bhopal', 'faridabad', 'jodhpur',
'udaipur', 'muzaffarpur', 'kolkata`', 'bulandshahar', 'haridwar',
'raigarh', 'visakhapatnam', 'jabalpur', 'unnao', 'aurangabad',
'belgaum', 'dehradun', 'rudrapur', 'jamshedpur', 'vizag', 'nouda',
'dharamshala', 'banagalore', 'hissar', 'ranchi', 'madurai',
'gurga', 'chandigarh', 'australia', 'cheyyar', 'sonepat',
'ghaziabad', 'pantnagar', 'jagdalpur', 'angul', 'baroda',
'ariyalur', 'jowai', 'kochi/cochin,chennaiandcoimbatore',
'neemrana', 'tirupathi', 'bhubneshwar', 'calicut', 'dubai',
'ahmednagar', 'nashik', 'bellary', 'ludhiana', 'muzaffarnagar',
'gagret', 'indirapuram,ghaziabad', 'gwalior', 'chennai&mumbai',
'rajasthan', 'sonipat', 'bareli', 'hospete', 'miryalaguda',
'dharuhera', 'meerut', 'ganjam', 'hubli', 'ncr', 'agra', 'trichy',
'kudankulam,tarapur', 'ongole', 'sambalpur', 'pondicherry',
'bundi', 'sadulpur,rajgarh,distt-churu,rajasthan', 'am', 'bikaner',
'vadodara', 'india', 'asansol', 'tirunelvelli', 'ernakulam',
'bilaspur', 'chandrapur', 'nanded', 'dharmapuri', 'vandavasi',
'rohtak', 'patna', 'salem', 'nasikcity', 'technopark,trivandrum',
'bharuch', 'tornagallu', 'jaspur', 'burdwan', 'shimla',
'gajiabaad', 'jammu', 'shahdol', 'muvattupuzha',
'aljubail,saudiarabia', 'kalmar,sweden', 'secunderabad',
'a-64,sec-64,noida', 'ratnagiri', 'jhajjar', 'gulbarga',
'hyderabad(bhadurpally)', 'nalagarh', 'jeddahsaudiarabia',
'chennai,bangalore', 'jamnagar', 'tirupati', 'gonda', 'orissa',
'kharagpur', 'navimumbai,hyderabad', 'joshimath', 'bathinda',
'johannesburg', 'kalaamb', 'karnal', 'london', 'kota', 'panchkula',
'baddihp', 'nagari', 'mettur,tamilnadu', 'durgapur', 'pondi',
'surat', 'kurnool', 'kolhapur', 'bhilai', 'hderabad',
'bahadurgarh', 'rayagada,odisha', 'kakinada', 'varanasi', 'punr',
'nellore', 'sahibabad', 'howrah', 'trichur', 'ambala', 'khopoli',
'keral', 'roorkee', 'allahabad', 'delhi/ncr', 'jalandhar', 'vapi',
'pilani', 'muzzafarpur', 'rasalkhaimah', 'bihar', 'singaruli',
'pondy', 'phagwara', 'guragaon', 'baripada', 'yamunanagar',
'shahibabad', 'sampla', 'guwahati', 'rourkela', 'banaglore',
'vellore', 'dausa', 'latur(maharashtra)', 'mainpuri', 'dammam',
'haldia', 'raebareli', 'patiala', 'gorakhpur', 'newdehli',
'ambalacity', 'karad', 'rajpura', 'haryana', 'asifabadbanglore'],
dtype=object)
df['JobCity'].replace({"banglore":"bangalore","bengaluru":"bangalore","banagalore":"bangalore","banaglore":"bangalore","guragaon":"gurgaon",
"asifabadbanglore":"bangalore","chennai&mumbai":"chennai","greaternoida":"noida","bhubaneswar":"bhubaneshwar",
"chennai,bangalore":"chennai", "a-64,sec-64,noida":"noida", "nouda":"noida","vsakhapttnam":"visakhapatnam",
"kochi/cochin,chennaiandcoimbatore":"kochi","kochi/cochin":"kochi", "ncr":"delhi","trivandrum":"thiruvananthapuram",
"technopark,trivandrum":"thiruvananthapuram", "hderabad":"hyderabad","hyderabad(bhadurpally)":"hyderabad",
"ambalacity":"ambala", "newdehli":"newdelhi","punr":"pune","jeddahsaudiarabia":"jeddah-saudiarabia","dammam":"damman",
"aljubail,saudiarabia":"aljubail-saudiarabia","kalmar,sweden":"kalmar-sweden","kudankulam,tarapur":"kudankulam",
"latur(maharashtra)":"latur","delhi/ncr":"delhi","mettur,tamilnadu":"mettur-tamilnadu","nasikcity":"nashik",
"rayagada,odisha":"rayagada-odisha","damananddiu":"damman","indirapuram,ghaziabad":"ghaziabad","baroda":"vadodara",
"gajiabaad":"ghaziabad","gaziabaad":"ghaziabad","navimumbai,hyderabad":"navimumbai","bhubneshwar":"bhubaneshwar",
"sadulpur,rajgarh,distt-churu,rajasthan":"rajasthan","tirupathi":"tirupati","newdelhi-jaisalmer":"newdelhi"}, inplace=True)
df['JobCity'].nunique()
195
After cleaning the city names, we left with 195 Job Cities.
df['JobCity']=df['JobCity'].str.title()
df['JobCity']
0 Bangalore
1 Indore
2 Chennai
3 Gurgaon
4 Manesar
...
3993 Newdelhi
3994 Hyderabad
3995 Bangalore
3996 Bangalore
3997 Chennai
Name: JobCity, Length: 3998, dtype: object
job_role=df['JobCity'].value_counts()[:20].reset_index()
plt.figure(figsize=(12,4))
sns.barplot(x='index', y='JobCity', data=job_role)
plt.xticks(rotation=60, fontsize=12)
plt.ylabel("Count of Job City")
plt.xlabel("Job City")
plt.title("Count of Enployees working in different City", fontsize=18, color='blue');
20% of employees working in Bangalore.%
df['Gender'].dtype
dtype('O')
df['Gender'].unique()
array(['f', 'm'], dtype=object)
plt.figure(figsize=(12,4))
sns.countplot(df['Gender'], palette='pink_r')
plt.xticks(fontsize=15)
plt.title("Count of Enployees by Gender", fontsize=18, color='blue');
df['DOB'].dtype
dtype('<M8[ns]')
To Answer this question we need present date and as per this dataset the present date can be assumed as 01-01-2016. Just Substract Date of Borth from Present Date i.e 01-01-2016. After Substracing We get TImeDelta result. By using datetime compoment we get the days from date of birth to now, by dividing this by 365 we get the Age of EMployee.
import datetime as dt
present='2016-01-01'
df['age_days']=(dt.datetime(2016, 1, 1) - df['DOB'])
df['age_days'].head(2)
0 9447 days 1 9585 days Name: age_days, dtype: timedelta64[ns]
df['Age']=((df['age_days'].dt.components.iloc[:, 0])/365).astype(int)
df.drop('age_days', axis=1, inplace=True)
df['Age'].head(2)
0 25 1 26 Name: Age, dtype: int32
print("Yongest Employee DOB :", df['DOB'].min())
print("Oldest Employee DOB :", df['DOB'].max())
Yongest Employee DOB : 1977-10-30 00:00:00 Oldest Employee DOB : 1997-05-27 00:00:00
print("Yongest Employee Age :", df['Age'].min())
print("Oldest Employee Age :", df['Age'].max())
print("Average Age of Employees:", df['Age'].mean())
Yongest Employee Age : 18 Oldest Employee Age : 38 Average Age of Employees: 24.595547773886942
30-10-1977, Youngest EMployees Date of Birth is 27-05-1997.38 in the 2016 and Youngest Employees age is 18 in 2016.
df['Age'].dtype
dtype('int32')
plt.figure(figsize=(12,4))
sns.countplot(df['Age'])
plt.xticks(fontsize=12)
plt.title("Count of Enployees by their Age", fontsize=18, color='blue');
22 to 27.70% of total employees are of age 23,24 & 25.
df['10percentage'].dtype
dtype('float64')
print("Minumum 10th Class Percentage :", df['10percentage'].min())
print("Maximum 10th Class Percentage :", df['10percentage'].max())
print("Average 10th Class Percentage: ", df['10percentage'].mean())
Minumum 10th Class Percentage : 43.0 Maximum 10th Class Percentage : 97.76 Average 10th Class Percentage: 77.9254427213606
plt.figure(figsize=(12,4))
sns.boxplot(df['10percentage'], color='pink')
plt.xticks(fontsize=12)
plt.title("Distribution of 10th Class Percentage of Employees", fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.histplot(df['10percentage'], color='orange', kde=True)
plt.xticks(fontsize=12)
plt.title("Distribution of 10th Class Percentage of Employees", fontsize=18, color='blue');
Minumum 10th Class Percentage is 43%, Maximum 10th Class Percentage is 97.76% & Average 10th Class Percentage is 77.93%.10th Percetage between 70% to 90%.10th Percentage is less than 50%.
df['10board'].dtype
dtype('O')
df['10board'].nunique()
275
df['10board'].unique()
array(['board ofsecondary education,ap', 'cbse', 'state board',
'mp board bhopal', 'icse',
'karnataka secondary school of examination', 'up',
'karnataka state education examination board', 'ssc',
'kerala state technical education', 0, 'bseb',
'state board of secondary education, andhra pradesh',
'matriculation', 'gujarat state board', 'karnataka state board',
'wbbse', 'maharashtra state board', 'icse board', 'up board',
'board of secondary education(bse) orissa',
'little jacky matric higher secondary school',
'uttar pradesh board', 'bsc,orissa', 'mp board', 'upboard',
'matriculation board', 'j & k bord', 'rbse',
'central board of secondary education', 'pseb', 'jkbose',
'haryana board of school education,(hbse)', 'metric', 'ms board',
'kseeb', 'stateboard', 'maticulation',
'karnataka secondory education board', 'mumbai board', 'sslc',
'kseb', 'board secondary education', 'matric board',
'board of secondary education',
'west bengal board of secondary education',
'jharkhand secondary examination board,ranchi', 'u p board',
'bseb,patna', 'hsc', 'bse', 'sss pune',
'karnataka education board (keeb)', 'kerala',
'state board of secondary education( ssc)', 'gsheb',
'up(allahabad)', 'nagpur', 'don bosco maatriculation school',
'karnataka state secondary education board', 'maharashtra',
'karnataka secondary education board',
'himachal pradesh board of school education',
'certificate of middle years program of ib',
'karnataka board of secondary education',
'board of secondary education rajasthan', 'uttarakhand board',
'ua', 'board of secendary education orissa',
'karantaka secondary education and examination borad', 'hbsc',
'kseeb(karnataka secondary education examination board)',
'cbse[gulf zone]', 'hbse', 'state(karnataka board)',
'jharkhand accademic council',
'jharkhand secondary examination board (ranchi)',
'karnataka secondary education examination board', 'delhi board',
'mirza ahmed ali baig', 'jseb', 'bse, odisha', 'bihar board',
'maharashtra state(latur board)', 'rajasthan board', 'mpboard',
'upbhsie', 'secondary board of rajasthan',
'tamilnadu matriculation board', 'jharkhand secondary board',
'board of secondary education,andhara pradesh', 'up baord',
'state', 'board of intermediate education',
'state board of secondary education,andhra pradesh',
'up board , allahabad',
'stjosephs girls higher sec school,dindigul', 'maharashtra board',
'education board of kerala', 'board of ssc',
'maharashtra state board pune',
'board of school education harayana',
'secondary school cerfificate', 'maharashtra sate board', 'ksseb',
'bihar examination board, patna', 'latur',
'board of secondary education, rajasthan', 'state borad hp',
'cluny', 'bsepatna', 'up borad', 'ssc board of andrapradesh',
'matric', 'bse,orissa', 'ssc-andhra pradesh', 'mp',
'karnataka education board', 'mhsbse',
'karnataka sslc board bangalore', 'karnataka', 'u p',
'secondary school of education', 'state board of karnataka',
'karnataka secondary board', 'andhra pradesh board ssc',
'stjoseph of cluny matrhrsecschool,neyveli,cuddalore district',
'hse,orissa', 'national public school', 'nagpur board',
'jharkhand academic council', 'bsemp',
'board of secondary education, andhra pradesh',
'board of secondary education orissa',
'board of secondary education,rajasthan(rbse)',
'board of secondary education,ap',
'board of secondary education,andhra pradesh',
'jawahar navodaya vidyalaya', 'aisse',
'karnataka board of higher education', 'bihar',
'kerala state board', 'cicse', 'tn state board',
'kolhapur divisional board, maharashtra',
'bharathi matriculation school', 'uttaranchal state board',
'wbbsce', 'mp state board', 'seba(assam)', 'anglo indian', 'gseb',
'uttar pradesh', 'ghseb', 'board of school education uttarakhand',
'msbshse,pune', 'tamilnadu state board', 'kerala university',
'uttaranchal shiksha avam pariksha parishad',
'bse(board of secondary education)',
'bright way college, (up board)',
'school secondary education, andhra pradesh',
'secondary state certificate',
'maharashtra state board of secondary and higher secondary education,pune',
'andhra pradesh state board', 'stmary higher secondary', 'cgbse',
'secondary school certificate', 'rajasthan board ajmer', 'mpbse',
'pune board', 'cbse ', 'board of secondary education,orissa',
'maharashtra state board,pune', 'up bord',
'kiran english medium high school', 'state board (jac, ranchi)',
'gujarat board', 'state board ', 'sarada high scchool',
'kalaimagal matriculation higher secondary school',
'karnataka board', 'maharastra board', 'sslc board',
'ssc maharashtra board', 'tamil nadu state', 'uttrakhand board',
'bihar secondary education board,patna',
'haryana board of school education',
'sri kannika parameswari highier secondary school, udumalpet',
'ksseb(karnataka state board)', 'nashik board',
'jharkhand secondary education board', 'himachal pradesh board',
'maharashtra satate board',
'maharashtra state board mumbai divisional board',
'dav public school,hehal',
'state board of secondary education, ap',
'rajasthan board of secondary education', 'hsce',
'karnataka secondary education',
'board of secondary education,odisha', 'maharashtra nasik board',
'west bengal board of secondary examination (wbbse)',
'holy cross matriculation hr sec school', 'cbsc', 'apssc',
'bseb patna', 'kolhapur', 'bseb, patna', 'up board allahabad',
'biharboard', 'nagpur board,nagpur', 'pune', 'gyan bharati school',
'rbse,ajmer', 'board of secondaray education',
'secondary school education', 'state bord', 'jbse,jharkhand',
'hse', 'madhya pradesh board', 'bihar school examination board',
'west bengal board of secondary eucation', 'state boardmp board ',
'icse board , new delhi',
'board of secondary education (bse) orissa',
'maharashtra state board for ssc',
'board of secondary school education', 'latur board',
"stmary's convent inter college", 'nagpur divisional board',
'ap state board', 'cgbse raipur', 'uttranchal board', 'ksbe',
'central board of secondary education, new delhi',
'bihar school examination board patna', 'cbse board',
'sslc,karnataka', 'mp-bse', 'up bourd', 'dav public school sec 14',
'board of school education haryana',
'council for indian school certificate examination',
'aurangabad board', 'j&k state board of school education',
'maharashtra state board of secondary and higher secondary education',
'maharashtra state boar of secondary and higher secondary education',
'ssc regular', 'karnataka state examination board', 'nasik',
'west bengal board of secondary education', 'up board,allahabad',
'bseb ,patna',
'state board - west bengal board of secondary education : wbbse',
'maharashtra state board of secondary & higher secondary education',
'delhi public school', 'karnataka secondary eduction',
'secondary education board of rajasthan',
'maharashtra board, pune', 'rbse (state board)', 'apsche',
'board of secondary education',
'board of high school and intermediate education uttarpradesh',
'kea', 'board of secondary education - andhra pradesh',
'ap state board for secondary education', 'seba',
'punjab school education board, mohali',
'jharkhand acedemic council', 'hse,board',
'board of ssc education andhra pradesh', 'up-board', 'bse,odisha'],
dtype=object)
df['10board']=df['10board'].str.strip()
def change_10thboard_name(name):
if name not in ['cbse', 'icse board', 'cbse board', 'icse', 'cisce', 'cbse[gulf zone]']:
return 'State Board'
elif name in ['cbse','cbse[gulf zone]','cbse board']:
return 'CBSE'
elif name in ['icse', 'icse board','cisce']:
return 'ICSE'
df['10board']=df['10board'].apply(change_10thboard_name)
df['10board'].unique()
array(['State Board', 'CBSE', 'ICSE'], dtype=object)
plt.figure(figsize=(12,4))
sns.countplot(df['10board'])
plt.xticks(fontsize=12)
plt.title("Count of Employees by their 10th Board", fontsize=18, color='blue');
60% Employees studied in Respective State Boards.35-38% EMployees studied in 'CBSE Board.2-3% studied in ICSE Board.
df['12graduation'].dtype
dtype('int64')
df['12graduation'].nunique()
16
plt.figure(figsize=(12,4))
sns.countplot(df['12graduation'])
plt.xticks(fontsize=12)
plt.title("Count of Employees by their 12th Pass Year", fontsize=18, color='blue');
df['12percentage'].dtype
dtype('float64')
print("Minumum 12th Class Percentage :", df['12percentage'].min())
print("Maximum 12th Class Percentage :", df['12percentage'].max())
print("Average 12th Class Percentage: ", df['12percentage'].mean())
Minumum 12th Class Percentage : 40.0 Maximum 12th Class Percentage : 98.7 Average 12th Class Percentage: 74.46636568284127
plt.figure(figsize=(12,4))
sns.boxplot(df['12percentage'], color='pink')
plt.xticks(fontsize=12)
plt.title("Distribution of 12th Class Percentage of Employees", fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.histplot(df['12percentage'], color='orange', kde=True)
plt.xticks(fontsize=12)
plt.title("Distribution of 12th Class Percentage of Employees", fontsize=18, color='blue');
Minumum 12th Class Percentage is 40%, Maximum 10th Class Percentage is 98.7% & Average 10th Class Percentage is 78.46%.12th Percetage between 60% to 90%.12th Percentage is less than 50%. and there is one employee whos percentage is 40%
df['12board'].dtype
dtype('O')
df['12board'].nunique()
340
df['12board'].unique()
array(['board of intermediate education,ap', 'cbse', 'state board',
'mp board', 'isc', 'icse', 'karnataka pre university board', 'up',
'p u board, karnataka', 'dept of pre-university education', 'bie',
'kerala state hse board', 'up board', 0, 'bseb', 'chse', 'puc',
' upboard',
'state board of intermediate education, andhra pradesh',
'karnataka state board',
'west bengal state council of technical education', 'wbchse',
'maharashtra state board', 'ssc', 'isc board',
'sda matric higher secondary school', 'uttar pradesh board', 'ibe',
'chsc', 'board of intermediate', 'isce', 'upboard', 'sbtet',
'hisher seconadry examination(state board)', 'pre university',
'borad of intermediate', 'j & k board',
'intermediate board of andhra pardesh', 'rbse',
'central board of secondary education', 'jkbose', 'hbse',
'board of intermediate education', 'state', 'ms board', 'pue',
'intermediate state board', 'stateboard', 'hsc',
'electonincs and communication(dote)', 'karnataka pu board',
'government polytechnic mumbai , mumbai board', 'pu board',
'baord of intermediate education', 'apbie', 'andhra board',
'tamilnadu stateboard',
'west bengal council of higher secondary education',
'cbse,new delhi', 'u p board', 'intermediate', 'biec,patna',
'diploma in engg (e &tc) tilak maharashtra vidayapeeth',
'hsc pune', 'pu board karnataka', 'kerala', 'gsheb',
'up(allahabad)', 'nagpur', 'st joseph hr sec school',
'pre university board', 'ipe', 'maharashtra', 'kea', 'apsb',
'himachal pradesh board of school education', 'staae board',
'international baccalaureate (ib) diploma', 'nios',
'karnataka board of university',
'board of secondary education rajasthan', 'uttarakhand board',
'ua', 'scte vt orissa', 'matriculation',
'department of pre-university education', 'wbscte',
'preuniversity board(karnataka)', 'jharkhand accademic council',
'bieap', 'msbte (diploma in computer technology)',
'jharkhand acamedic council (ranchi)',
'department of pre-university eduction', 'biec', 'all india board',
'sjrcw', ' board of intermediate', 'msbte',
'sri sankara vidyalaya', 'chse, odisha', 'bihar board',
'maharashtra state(latur board)', 'rajasthan board', 'mpboard',
'state board of technical eduction panchkula', 'upbhsie', 'apbsc',
'state board of technical education and training',
'secondary board of rajasthan',
'tamilnadu higher secondary education board',
'jharkhand academic council',
'board of intermediate education,hyderabad', 'up baord', 'pu',
'dte', 'board of secondary education', 'pre-university',
'board of intermediate education,andhra pradesh',
'up board , allahabad', 'srv girls higher sec school,rasipuram',
'intermediate board of education,andhra pradesh',
'intermediate board examination',
'department of pre-university education, bangalore',
'stmiras college for girls', 'mbose',
'department of pre-university education(government of karnataka)',
'dpue', 'msbte pune', 'board of school education harayana',
'sbte, jharkhand', 'bihar intermediate education council, patna',
'higher secondary', 's j polytechnic', 'latur',
'board of secondary education, rajasthan', 'jyoti nivas', 'pseb',
'biec-patna', 'board of intermediate education,andra pradesh',
'chse,orissa', 'pre-university board', 'mp', 'intermediate board',
'govt of karnataka department of pre-university education',
'karnataka education board',
'board of secondary school of education', 'pu board ,karnataka',
'karnataka secondary education board', 'karnataka sslc',
'board of intermediate ap', 'u p', 'state board of karnataka',
'directorate of technical education,banglore', 'matric board',
'andhpradesh board of intermediate education',
'stjoseph of cluny matrhrsecschool,neyveli,cuddalore district',
'bte up', 'scte and vt ,orissa', 'hbsc',
'jawahar higher secondary school', 'nagpur board', 'bsemp',
'board of intermediate education, andhra pradesh',
'board of higher secondary orissa',
'board of secondary education,rajasthan(rbse)',
'board of intermediate education:ap,hyderabad', 'science college',
'karnatak pu board', 'aissce', 'pre university board of karnataka',
'bihar', 'kerala state board', 'uo board', 'cicse',
'karnataka board', 'tn state board',
'kolhapur divisional board, maharashtra',
'jaycee matriculation school',
'board of higher secondary examination, kerala',
'uttaranchal state board', 'intermidiate', 'bciec,patna', 'bice',
'karnataka state', 'state broad', 'wbbhse', 'gseb',
'uttar pradesh', 'ghseb', 'board of school education uttarakhand',
'gseb/technical education board', 'msbshse,pune',
'tamilnadu state board', 'board of technical education',
'kerala university', 'uttaranchal shiksha avam pariksha parishad',
'chse(concil of higher secondary education)',
'bright way college, (up board)', 'board of intermidiate',
'higher secondary state certificate', 'karanataka secondary board',
'maharashtra board', 'andhra pradesh state board', 'cgbse',
'diploma in computers', 'bte,delhi', 'rajasthan board ajmer',
'mpbse', 'pune board', 'state board of technical education',
'gshseb', 'amravati divisional board',
'dote (diploma - computer engg)', 'up bord',
'karnataka pre-university board', 'jharkhand board',
'punjab state board of technical education & industrial training',
'department of technical education',
'sri chaitanya junior kalasala', 'state board (jac, ranchi)',
'gujarat board', 'aligarh muslim university',
'tamil nadu state board', 'hse', 'karnataka secondary education',
'state board ', 'karnataka pre unversity board',
'ks rangasamy institute of technology',
'karnataka board secondary education', 'narayana junior college',
'bteup', 'board of intermediate(bie)', 'hsc maharashtra board',
'tamil nadu state', 'uttrakhand board', 'psbte',
'stateboard/tamil nadu', 'intermediate council patna',
'technical board, punchkula', 'board of intermidiate examination',
'sri kannika parameswari highier secondary school, udumalpet',
'ap board', 'nashik board', 'himachal pradesh board',
'maharashtra satate board',
'andhra pradesh board of secondary education',
'tamil nadu polytechnic',
'maharashtra state board mumbai divisional board',
'department of pre university education',
'dav public school,hehal', 'board of intermediate education, ap',
'rajasthan board of secondary education',
'department of technical education, bangalore', 'chse,odisha',
'maharashtra nasik board',
'west bengal council of higher secondary examination (wbchse)',
'holy cross matriculation hr sec school', 'cbsc',
'pu board karnataka', 'biec patna', 'kolhapur', 'bseb, patna',
'up board allahabad', 'intermideate', 'nagpur board,nagpur',
'diploma(msbte)', 'dav public school',
'pre university board, karnataka', 'ssm srsecschool', 'state bord',
'jstb,jharkhand', 'intermediate board of education',
'mp board bhopal', 'pub', 'madhya pradesh board',
'bihar intermediate education council',
'west bengal council of higher secondary eucation',
'isc board , new delhi', 'mpc',
'certificate for higher secondary education (chse)orissa',
'maharashtra state board for hsc',
'board of intermeadiate education', 'latur board',
'andhra pradesh', 'karnataka pre-university',
'lucknow public college', 'nagpur divisional board',
'ap intermediate board', 'cgbse raipur', 'uttranchal board',
'jiec', 'central board of secondary education, new delhi',
'bihar school examination board patna',
'state board of technical education harayana', 'mp-bse',
'up bourd', 'dav public school sec 14',
'haryana state board of technical education chandigarh',
'council for indian school certificate examination',
'jaswant modern school', 'madhya pradesh open school',
'aurangabad board', 'j&k state board of school education',
'diploma ( maharashtra state board of technical education)',
'board of technicaleducation ,delhi',
'maharashtra state boar of secondary and higher secondary education',
'hslc (tamil nadu state board)',
'karnataka state examination board', 'puboard', 'nasik',
'west bengal board of higher secondary education',
'up board,allahabad', 'board of intrmediate education,ap', 'cbese',
'karnataka state pre- university board',
'state board - west bengal council of higher secondary education : wbchse',
'maharashtra state board of secondary & higher secondary education',
'biec, patna', 'state syllabus', 'cbse board', 'scte&vt',
'board of intermediate,ap',
'secnior secondary education board of rajasthan',
'maharashtra board, pune', 'rbse (state board)',
'board of intermidiate education,ap',
'board of high school and intermediate education uttarpradesh',
'higher secondary education',
'board fo intermediate education, ap', 'intermedite',
'ap board for intermediate education', 'ahsec',
'punjab state board of technical education & industrial training, chandigarh',
'state board - tamilnadu', 'jharkhand acedemic council',
'scte & vt (diploma)', 'karnataka pu',
'board of intmediate education ap', 'up-board',
'boardofintermediate'], dtype=object)
df['10board']=df['10board'].str.strip()
def change_12thboard_name(name):
if name not in ['cbse', 'icse board', 'cbse board', 'icse', 'cisce', 'cbse[gulf zone]']:
return 'State Board'
elif name in ['cbse','cbse[gulf zone]','cbse board']:
return 'CBSE'
elif name in ['icse', 'icse board','cisce']:
return 'ICSE'
df['12board']=df['12board'].apply(change_12thboard_name)
df['12board'].unique()
array(['State Board', 'CBSE', 'ICSE'], dtype=object)
plt.figure(figsize=(12,4))
sns.countplot(df['12board'])
plt.xticks(fontsize=12)
plt.title("Count of Employees by their 12th Board", fontsize=18, color='blue');
60% Employees studied in Respective State Boards.38% EMployees studied in 'CBSE Board.1-2% studied in ICSE Board.
df['CollegeTier'].dtype
dtype('int64')
df['CollegeTier'].unique()
array([2, 1], dtype=int64)
plt.figure(figsize=(12,4))
sns.countplot(df['CollegeTier'], palette='pink_r')
plt.xticks(fontsize=12)
plt.title("Count of Employees by their College Tier", fontsize=18, color='blue');
90% Employeess are from 2nd College Tier.
df['Degree'].dtype
dtype('O')
df['Degree'].unique()
array(['B.Tech/B.E.', 'MCA', 'M.Tech./M.E.', 'M.Sc. (Tech.)'],
dtype=object)
df['Degree'].value_counts()
B.Tech/B.E. 3700 MCA 243 M.Tech./M.E. 53 M.Sc. (Tech.) 2 Name: Degree, dtype: int64
plt.figure(figsize=(12,4))
sns.countplot(df['Degree'])
plt.xticks(fontsize=12)
plt.title("Count of Employees by Degree", fontsize=18, color='blue');
90% Employees are from Engineering discipline (B.E/B.Tech & ME/M.Tech)5-6% are from MCA2 employees are from MSc.
df['Specialization'].dtype
dtype('O')
df['Specialization'].unique()
array(['computer engineering',
'electronics and communication engineering',
'information technology', 'computer science & engineering',
'mechanical engineering', 'electronics and electrical engineering',
'electronics & telecommunications',
'instrumentation and control engineering', 'computer application',
'electronics and computer engineering', 'electrical engineering',
'applied electronics and instrumentation',
'electronics & instrumentation eng',
'information science engineering', 'civil engineering',
'mechanical and automation', 'industrial & production engineering',
'control and instrumentation engineering',
'metallurgical engineering',
'electronics and instrumentation engineering',
'electronics engineering', 'ceramic engineering',
'chemical engineering', 'aeronautical engineering', 'other',
'biotechnology', 'embedded systems technology',
'electrical and power engineering',
'computer science and technology', 'mechatronics',
'automobile/automotive engineering', 'polymer technology',
'mechanical & production engineering',
'power systems and automation', 'instrumentation engineering',
'telecommunication engineering',
'industrial & management engineering', 'industrial engineering',
'computer and communication engineering',
'information & communication technology', 'information science',
'internal combustion engine', 'computer networking',
'biomedical engineering', 'electronics', 'computer science'],
dtype=object)
def change_branch_names(branch):
if branch in ['computer science','computer science and technology', 'computer and communication engineering',
'computer science & engineering','computer engineering','computer networking']:
return "Computer Science Engineering"
elif branch in ['electronics','electronics engineering','electronics and communication engineering',
'electronics and computer engineering']:
return "Electronics & Communication Engineering"
elif branch in ['electronics & instrumentation eng','electronics & instrumentation eng','instrumentation engineering',
'control and instrumentation engineering','electronics and instrumentation engineering',
'instrumentation and control engineering','applied electronics and instrumentation']:
return "Electronics & Instrumentation Engineering"
elif branch in ['electronics & telecommunications','telecommunication engineering',]:
return "Electronics & Telecommunication Engineering"
elif branch in ['information & communication technology', 'information science','information technology',
'information science engineering']:
return "Information Technology/Science"
elif branch in ['electronics and electrical engineering','electrical and power engineering']:
return "Electrical Engineering"
elif branch in ['industrial & management engineering', 'industrial engineering']:
return "Industrial Engineering"
elif branch in ['internal combustion engine','mechanical & production engineering','mechanical and automation','mechatronics']:
return "Mechanical Engineering"
else:
return branch
df['Specialization']=df['Specialization'].apply(change_branch_names)
#Making Specialization in title format
df['Specialization']=df['Specialization'].str.title()
df['Specialization'].unique()
array(['Computer Science Engineering',
'Electronics & Communication Engineering',
'Information Technology/Science', 'Mechanical Engineering',
'Electrical Engineering',
'Electronics & Telecommunication Engineering',
'Electronics & Instrumentation Engineering',
'Computer Application', 'Civil Engineering',
'Industrial & Production Engineering', 'Metallurgical Engineering',
'Ceramic Engineering', 'Chemical Engineering',
'Aeronautical Engineering', 'Other', 'Biotechnology',
'Embedded Systems Technology', 'Automobile/Automotive Engineering',
'Polymer Technology', 'Power Systems And Automation',
'Industrial Engineering', 'Biomedical Engineering'], dtype=object)
plt.figure(figsize=(12,8))
sns.countplot(y=df['Specialization'])
plt.xticks(fontsize=12)
plt.title("Count of Employees by Specialization/Branch", fontsize=18, color='blue');
Computer Science, Electronics & Communication, Information Technology, Mechanical, Electrical, Electronics & Telecommunication, Electronics & Instrumentation and Computer Application.CS,EC,IT/IS are more likely to employed in Software Field.
df['collegeGPA'].dtype
dtype('float64')
plt.figure(figsize=(12,4))
sns.histplot(df['collegeGPA'], color='orange', kde=True)
plt.xticks(fontsize=12)
plt.title("Distribution of College GPA", fontsize=18, color='blue');
def gpa_conversion(gpa):
if gpa<10:
return gpa*10
else:
return gpa
df['collegeGPA']=df['collegeGPA'].apply(gpa_conversion)
print("Highest GPA:",df['collegeGPA'].max())
print("Lowest GPA:",df['collegeGPA'].min())
print("Average GPA:",df['collegeGPA'].mean())
Highest GPA: 99.93 Lowest GPA: 49.07 Average GPA: 71.69572786393196
plt.figure(figsize=(12,4))
sns.boxplot(df['collegeGPA'], color='pink')
plt.xticks(fontsize=12)
plt.title("Distribution of College GPA", fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.histplot(df['collegeGPA'], color='orange', kde=True)
plt.xticks(fontsize=12)
plt.title("Distribution of College GPA", fontsize=18, color='blue');
49%, Maximum is 99.93% and Average GPA is 71.70%.60%-90%.No extreame Outliers in the GPA column.
df['CollegeCityTier'].dtype
dtype('int64')
df['CollegeCityTier'].unique()
array([0, 1], dtype=int64)
plt.figure(figsize=(12,4))
sns.countplot(y= df['CollegeCityTier'], palette='pink_r')
plt.xticks(fontsize=12)
plt.title("College City Tier", fontsize=18, color='blue');
Most of the Colleges Located in the 0th Tier of the city.
df['CollegeState'].dtype
dtype('O')
df['CollegeState'].unique()
array(['Andhra Pradesh', 'Madhya Pradesh', 'Uttar Pradesh', 'Delhi',
'Karnataka', 'Tamil Nadu', 'West Bengal', 'Maharashtra', 'Haryana',
'Telangana', 'Orissa', 'Punjab', 'Kerala', 'Gujarat', 'Rajasthan',
'Chhattisgarh', 'Uttarakhand', 'Jammu and Kashmir', 'Jharkhand',
'Himachal Pradesh', 'Bihar', 'Assam', 'Goa', 'Sikkim',
'Union Territory', 'Meghalaya'], dtype=object)
plt.figure(figsize=(12,4))
sns.countplot(df['CollegeState'])
plt.xticks(fontsize=12, rotation=70)
plt.title("Count of the Employees by their College State", fontsize=18, color='blue');
Uttar Pradesh College's.Uttar Pradesh, Karnataka, Tamil Nadu, Telangana, Maharastra.Assam, Goa, Sikkim, Meghalaya and Unioin Teretories.
df['GraduationYear'].dtype
dtype('int64')
df['GraduationYear'].unique()
array([2011, 2012, 2014, 2016, 2013, 2010, 2015, 2009, 2017, 0, 2007],
dtype=int64)
df[df['GraduationYear']==0]
| ID | Salary | DOJ | DOL | Designation | JobCity | Gender | DOB | 10percentage | 10board | 12graduation | 12percentage | 12board | CollegeID | CollegeTier | Degree | Specialization | collegeGPA | CollegeCityID | CollegeCityTier | CollegeState | GraduationYear | English | Logical | Quant | Domain | ComputerProgramming | ElectronicsAndSemicon | ComputerScience | MechanicalEngg | ElectricalEngg | TelecomEngg | CivilEngg | conscientiousness | agreeableness | extraversion | nueroticism | openess_to_experience | Year_of_Joining | Year_of_Leaving | Age | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2664 | 794381 | 325000 | 2014-07-01 | 2015-07-01 | System Engineer | Chandigarh | m | 1991-12-17 | 89.0 | CBSE | 2010 | 85.2 | CBSE | 8346 | 2 | B.Tech/B.E. | Mechanical Engineering | 68.76 | 8346 | 0 | Uttar Pradesh | 0 | 695 | 655 | 680 | 0.755375 | -1 | -1 | -1 | 446 | -1 | -1 | -1 | 1.1336 | 0.3789 | -0.2974 | -0.3612 | -1.244 | 2014 | 2015 | 24 |
df['GraduationYear'].replace({0:2014}, inplace=True)
df['GraduationYear'].unique()
array([2011, 2012, 2014, 2016, 2013, 2010, 2015, 2009, 2017, 2007],
dtype=int64)
plt.figure(figsize=(12,4))
sns.countplot(df['GraduationYear'])
plt.xticks(fontsize=12)
plt.title("Count of the Employees by their Bachelors Degree Graduation Year", fontsize=18, color='blue');
Highest Number of Employees are completed their Bachelors Degree in year 2013 and least in 2007.
df[['English','Logical','Quant','Domain']].dtypes
English int64 Logical int64 Quant int64 Domain float64 dtype: object
fig,ax=plt.subplots(2,2, figsize=(15,6))
sns.boxplot(df['English'], color='pink', ax=ax[0][0]).set(title='Distribution Scores in English')
sns.boxplot(df['Logical'], color='orange', ax=ax[0][1]).set(title='Distribution of Scores in Logical')
sns.boxplot(df['Quant'], color='blue', ax=ax[1][0]).set(title='Distribution of Scores in Quantitative')
sns.boxplot(df['Domain'], color='green', ax=ax[1][1]).set(title='Distribution of Scores in Domain');
fig,ax=plt.subplots(2,2, figsize=(15,6))
sns.histplot(df['English'],kde=True, color='pink', ax=ax[0][0]).set(title='Distribution Scores in English')
sns.histplot(df['Logical'],kde=True, color='orange', ax=ax[0][1]).set(title='Distribution of Scores in Logical')
sns.histplot(df['Quant'],kde=True, color='blue', ax=ax[1][0]).set(title='Distribution of Scores in Quantitative')
sns.histplot(df['Domain'],kde=True, color='green', ax=ax[1][1]).set(title='Distribution of Scores in Domain');
scores=-1, this means either they did not attended the Domain Exam or they do not have any specific doamin.print("Minimum Score in English={}, Logical={}, Quantitative={}, Domain={}".format(df['English'].min(),df['Logical'].min(), df['Quant'].min(), df['Domain'].min()))
print("Maximum Score in English={}, Logical={}, Quantitative={}, Domain={}".format(df['English'].max(),df['Logical'].max(), df['Quant'].max(), df['Domain'].max()))
print("Average Score in English={}, Logical={}, Quantitative={}, Domain={}".format(df['English'].mean(),df['Logical'].mean(), df['Quant'].mean(), df['Domain'].mean()))
Minimum Score in English=180, Logical=195, Quantitative=120, Domain=-1.0 Maximum Score in English=875, Logical=795, Quantitative=900, Domain=0.999910407631314 Average Score in English=501.64907453726863, Logical=501.59879939969983, Quantitative=513.3781890945472, Domain=0.5104896530075451
500 to 510.¶
df[['ComputerProgramming','ElectronicsAndSemicon','ComputerScience','MechanicalEngg','ElectricalEngg','TelecomEngg','CivilEngg']].dtypes
ComputerProgramming int64 ElectronicsAndSemicon int64 ComputerScience int64 MechanicalEngg int64 ElectricalEngg int64 TelecomEngg int64 CivilEngg int64 dtype: object
scores=-1, lets see the distribution for only the score>=0¶fig,ax=plt.subplots(4,2, figsize=(18,20))
sns.boxplot(df[df['ComputerProgramming']>=0].ComputerProgramming, color='pink', ax=ax[0][0]).set(title='Distribution Scores in Computer Programming')
sns.boxplot(df[df['ElectronicsAndSemicon']>=0].ElectronicsAndSemicon, color='orange', ax=ax[0][1]).set(title='Distribution of Scores in Electronics And Semicon')
sns.boxplot(df[df['ComputerScience']>=0].ComputerScience, color='blue', ax=ax[1][0]).set(title='Distribution of Scores in Computer Science')
sns.boxplot(df[df['MechanicalEngg']>=0].MechanicalEngg, color='green', ax=ax[1][1]).set(title='Distribution of Scores in Mechanical Engineering')
sns.boxplot(df[df['ElectricalEngg']>=0].ElectricalEngg, color='yellow', ax=ax[2][0]).set(title='Distribution of Scores in Electrical Engineering')
sns.boxplot(df[df['TelecomEngg']>=0].TelecomEngg, color='lightblue', ax=ax[2][1]).set(title='Distribution of Scores in Telecom Engineering')
sns.boxplot(df[df['CivilEngg']>=0].CivilEngg, color='gray', ax=ax[3][0]).set(title='Distribution of Scores in Civil Engineering');
fig,ax=plt.subplots(4,2, figsize=(18,20))
sns.histplot(df[df['ComputerProgramming']>=0].ComputerProgramming, kde=True, color='pink', ax=ax[0][0]).set(title='Distribution Scores in Computer Programming')
sns.histplot(df[df['ElectronicsAndSemicon']>=0].ElectronicsAndSemicon, kde=True, color='orange', ax=ax[0][1]).set(title='Distribution of Scores in Electronics And Semicon')
sns.histplot(df[df['ComputerScience']>=0].ComputerScience, kde=True, color='blue', ax=ax[1][0]).set(title='Distribution of Scores in Computer Science')
sns.histplot(df[df['MechanicalEngg']>=0].MechanicalEngg, kde=True, color='green', ax=ax[1][1]).set(title='Distribution of Scores in Mechanical Engineering')
sns.histplot(df[df['ElectricalEngg']>=0].ElectricalEngg, kde=True, color='yellow', ax=ax[2][0]).set(title='Distribution of Scores in Electrical Engineering')
sns.histplot(df[df['TelecomEngg']>=0].TelecomEngg, kde=True, color='lightblue', ax=ax[2][1]).set(title='Distribution of Scores in Telecom Engineering')
sns.histplot(df[df['CivilEngg']>=0].CivilEngg, kde=True, color='gray', ax=ax[3][0]).set(title='Distribution of Scores in Civil Engineering');
Computer Programming Exam Job Swwkers have score above 800 Marks, but rest of the Exams the Maximum score is very lessCompetetion in the field of Computer Programming.for col in ['ComputerProgramming','ElectronicsAndSemicon','ComputerScience','MechanicalEngg','ElectricalEngg','TelecomEngg','CivilEngg']:
print(f'Subject: {col}')
print(f'Minimum Score:{(df[df[col]>=0][col]).min()}', end=" ")
print(f'Maximum Score:{(df[df[col]>=0][col]).max()}', end=" ")
print(f'Average Score:{(df[df[col]>=0][col]).mean()}')
print("--"*10)
Subject: ComputerProgramming Minimum Score:105 Maximum Score:840 Average Score:451.3012779552716 -------------------- Subject: ElectronicsAndSemicon Minimum Score:133 Maximum Score:612 Average Score:335.6442307692308 -------------------- Subject: ComputerScience Minimum Score:130 Maximum Score:715 Average Score:405.6363636363636 -------------------- Subject: MechanicalEngg Minimum Score:180 Maximum Score:623 Average Score:406.87659574468086 -------------------- Subject: ElectricalEngg Minimum Score:206 Maximum Score:676 Average Score:433.0372670807453 -------------------- Subject: TelecomEngg Minimum Score:153 Maximum Score:548 Average Score:350.17379679144386 -------------------- Subject: CivilEngg Minimum Score:166 Maximum Score:516 Average Score:349.6666666666667 --------------------
experince=(df['DOL']-df['DOJ']).dt.components.iloc[:,0]/365
print("Maximum Years Worked in Company:", experince.max())
print("Minimum Years Worked in Company:", experince.min())
Maximum Years Worked in Company: 24.602739726027398 Minimum Years Worked in Company: -0.8301369863013699
x=df[(df['DOL']-df['DOJ']).dt.components.iloc[:,0]<0]
x.shape[0]
40
38 rows have DOJ greater than the DOL.
DOL Must be greater than the DOJ. But here inverse happened. Lets Impute with the "Present Date" which we were created earlier("01-01-2016").
df.iloc[x.index,3]=dt.datetime(2016,1,1)
df['YearsOfExperience']=(df['DOL']-df['DOJ']).dt.components.iloc[:,0]/365
print("Maximum Years Worked in Company:", df['YearsOfExperience'].max())
print("Minimum Years Worked in Company:", df['YearsOfExperience'].min())
print("Average Years Worked in Company:", df['YearsOfExperience'].mean())
Maximum Years Worked in Company: 24.602739726027398 Minimum Years Worked in Company: 0.0 Average Years Worked in Company: 1.7463505725465414
plt.figure(figsize=(12,4))
sns.histplot(df['YearsOfExperience'], kde=True, color='orange')
plt.xticks(fontsize=12)
plt.title("Distribution of Years an Employee worked in a company", fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.boxplot(df['YearsOfExperience'], color='pink')
plt.xticks(fontsize=12)
plt.title("Distribution of Years an Employee worked in a company", fontsize=18, color='blue');
df[df['YearsOfExperience']>5].shape[0]
59
24 Years in a company.0 Years in a company.1 year and 8 Months.95% of employees has worked between 0-5 Years in a company.59 Employees has 5+ Years Experience.
plt.figure(figsize=(12,4))
sns.scatterplot(x=df['YearsOfExperience'], y=df['Salary'], color='orange')
plt.xticks(fontsize=12)
plt.title("Relationship between Years of Experience and Salary", fontsize=18, color='blue');
rel_sal_exp=df.copy()
Removing Salaries above 2.5 Lakks as there are only 6 Datapoints and Years of Experience above 7.
new_sal=rel_sal_exp[rel_sal_exp['Salary']<2000000]
new_sal_exp=new_sal[new_sal['YearsOfExperience']<=7]
plt.figure(figsize=(12,4))
sns.regplot(x=new_sal_exp['YearsOfExperience'], y=new_sal_exp['Salary'],marker='*',
color='orange', line_kws=dict(color='black'))
plt.xticks(fontsize=12)
plt.title("Relationship between Years of Experience and Salary", fontsize=18, color='blue');
There is a Positive Relationship Between Years of Experience & Salary, As Experience increases Salary increases.
plt.figure(figsize=(12,4))
sns.barplot(x=df['Designation'], y=df['Salary'], ci=None,
order=df.groupby('Designation').Salary.mean().sort_values(ascending=False).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Top 15 high Paying Job Designation", fontsize=18, color='blue');
Junior Manager, Senior Developer and Data Scientistthese are the Job Designation with high average Salary.
plt.figure(figsize=(12,4))
sns.barplot(x=df['Designation'], y=df['Salary'], ci=None,
order=df.groupby('Designation').Salary.mean().sort_values(ascending=True).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Low Paying Job Designation", fontsize=18, color='blue');
Secreary, Trainee Software Developer, Visiting Faculty are have low average salary.
plt.figure(figsize=(12,4))
sns.barplot(x=df['Gender'], y=df['Salary'], ci=None, palette='pink_r')
plt.xticks(fontsize=15)
plt.title("Average Salary by Gender", fontsize=18, color='blue');
Average Salary for Male and Female Employee is almost Equal.
designation_by_gender=pd.crosstab(index=df['Designation'], columns=df['Gender']).reset_index()
designation_by_gender.columns=['Designation','Female','Male']
designation_by_gender.head()
| Designation | Female | Male | |
|---|---|---|---|
| 0 | .Net Developer | 9 | 26 |
| 1 | .Net Web Developer | 1 | 3 |
| 2 | Account Executive | 2 | 2 |
| 3 | Account Manager | 0 | 1 |
| 4 | Admin Assistant | 0 | 2 |
plt.figure(figsize=(12,4))
sns.barplot(x='Designation', y='Female', data=designation_by_gender, ci=None,
order=designation_by_gender.groupby('Designation').Female.mean().sort_values(ascending=False).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Top 15 Job Designation for Female", fontsize=18, color='blue');
Most Females working as Software Engineer, System Engineer, Software Developer, Programmer Analyst.
plt.figure(figsize=(12,4))
sns.barplot(x='Designation', y='Male', data=designation_by_gender, ci=None,
order=designation_by_gender.groupby('Designation').Male.mean().sort_values(ascending=False).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Top 15 Job Designation for Male", fontsize=18, color='blue');
Software Engineer, System Engineer, Software Developer, Programmer Analyst.
new_sal=df.copy()
new_sal=new_sal[new_sal['Salary']<1500000]
plt.figure(figsize=(12,4))
sns.regplot(y='Salary', x='collegeGPA', data=new_sal, marker='*',
color='orange', line_kws=dict(color='black'))
plt.xticks(fontsize=12)
plt.title("Relationship between College GPA and Salary", fontsize=18, color='blue');
There is a Positive relation between College GPA and Salary, but the relation is not stronger.
df['Total_Score']=df['English']+df['Logical']+df['Quant']+df['Domain']
new_sal=df.copy()
new_sal=new_sal[new_sal['Salary']<1500000]
plt.figure(figsize=(12,4))
sns.regplot(y='Salary', x='Total_Score', data=df, marker='*',
color='orange', line_kws=dict(color='black'))
plt.xticks(fontsize=12)
plt.title("Relationship between Employability Score and Salary", fontsize=18, color='blue');
Higher is the Test Score higher the chances of getting high paying jobs.
edu_perf=pd.pivot_table(data=df, index='Gender', values=['10percentage','12percentage','collegeGPA'], aggfunc='mean').T
edu_perf.reset_index(inplace=True)
edu_perf.columns=['Percentage','Female','Male']
edu_perf
| Percentage | Female | Male | |
|---|---|---|---|
| 0 | 10percentage | 80.932894 | 76.979000 |
| 1 | 12percentage | 77.007618 | 73.666636 |
| 2 | collegeGPA | 74.135517 | 70.927928 |
fig=px.bar(data_frame=edu_perf, y='Percentage',x=['Female','Male'], text_auto=True,
title='Education Performance Comparison by Gender')
fig.show()
In 10th, 12th and even in Graduation the performance of Female student is better than Male😮😮.
plt.figure(figsize=(12,4))
sns.barplot(x='Specialization', y='collegeGPA', data=df, ci=None,
order=df.groupby('Specialization').collegeGPA.mean().sort_values(ascending=False).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Top 15 Specialization by College GPA", fontsize=18, color='blue');
Embeded Systems Technology has highest average COllege GPA.
plt.figure(figsize=(12,4))
sns.barplot(x='Specialization', y='Salary', data=df, ci=None,
order=df.groupby('Specialization').Salary.mean().sort_values(ascending=False).index[:15])
plt.xticks(fontsize=12, rotation=70)
plt.title("Top 15 Specialization by Average Salary", fontsize=18, color='blue');
df[df['Specialization']=='Polymer Technology']
| ID | Salary | DOJ | DOL | Designation | JobCity | Gender | DOB | 10percentage | 10board | 12graduation | 12percentage | 12board | CollegeID | CollegeTier | Degree | Specialization | collegeGPA | CollegeCityID | CollegeCityTier | CollegeState | GraduationYear | English | Logical | Quant | Domain | ComputerProgramming | ElectronicsAndSemicon | ComputerScience | MechanicalEngg | ElectricalEngg | TelecomEngg | CivilEngg | conscientiousness | agreeableness | extraversion | nueroticism | openess_to_experience | Year_of_Joining | Year_of_Leaving | Age | YearsOfExperience | Total_Score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 990 | 962376 | 700000 | 2014-07-01 | 2016-01-01 | Product Manager | Jaipur | m | 1992-01-03 | 87.4 | CBSE | 2009 | 74.0 | State Board | 436 | 1 | M.Tech./M.E. | Polymer Technology | 72.79 | 436 | 0 | Uttarakhand | 2014 | 625 | 450 | 655 | -1.0 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -0.3027 | -0.4536 | -0.6048 | -0.8682 | 0.0973 | 2014 | 2016 | 24 | 1.50411 | 1729.0 |
Polymer Technology has Highest salary. Becuase there is only Entry for this Specialization and the salary fot that employee is 7lakh.
plt.figure(figsize=(12,4))
sns.barplot(x='CollegeState', y='collegeGPA', data=df, ci=None)
plt.xticks(fontsize=12, rotation=70)
plt.title("College GPA by State", fontsize=18, color='blue');
“After doing your Computer Science Engineering if you take up jobs as a Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer you can earn up to 2.5-3 lakhs as a fresh
graduate.” Test this claim with the data given to you.Create DataFrame of Employee who Graduated in Computer Science Engineering descipline/Specialization.
CS=df[df['Specialization']=='Computer Science Engineering']
CS.head(2)
| ID | Salary | DOJ | DOL | Designation | JobCity | Gender | DOB | 10percentage | 10board | 12graduation | 12percentage | 12board | CollegeID | CollegeTier | Degree | Specialization | collegeGPA | CollegeCityID | CollegeCityTier | CollegeState | GraduationYear | English | Logical | Quant | Domain | ComputerProgramming | ElectronicsAndSemicon | ComputerScience | MechanicalEngg | ElectricalEngg | TelecomEngg | CivilEngg | conscientiousness | agreeableness | extraversion | nueroticism | openess_to_experience | Year_of_Joining | Year_of_Leaving | Age | YearsOfExperience | Total_Score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203097 | 420000 | 2012-06-01 | 2016-01-01 | Senior Quality Engineer | Bangalore | f | 1990-02-19 | 84.3 | State Board | 2007 | 95.8 | State Board | 1141 | 2 | B.Tech/B.E. | Computer Science Engineering | 78.00 | 1141 | 0 | Andhra Pradesh | 2011 | 515 | 585 | 525 | 0.635979 | 445 | -1 | -1 | -1 | -1 | -1 | -1 | 0.9737 | 0.8128 | 0.5269 | 1.3549 | -0.4455 | 2012 | 2016 | 25 | 3.586301 | 1625.635979 |
| 3 | 267447 | 1100000 | 2011-07-01 | 2016-01-01 | Senior Software Engineer | Gurgaon | m | 1989-12-05 | 85.6 | CBSE | 2007 | 83.6 | CBSE | 6920 | 1 | B.Tech/B.E. | Computer Science Engineering | 74.64 | 6920 | 1 | Delhi | 2011 | 635 | 585 | 625 | 0.974396 | 615 | -1 | -1 | -1 | -1 | -1 | -1 | 0.0464 | 0.3448 | -0.3440 | -0.4078 | -0.9194 | 2011 | 2016 | 26 | 4.506849 | 1845.974396 |
CS.shape[0]
1354
There 1354 Employees Studies in Computer Science Engineering
Create DataFrame of Computer Science Engineering graduates working in Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer these job Roles/Designations.
req_des=CS[(CS['Designation']=='Program Analyst Trainee') |
(CS['Designation']=='Programmer Analyst Trainee') |
(CS['Designation']=='Software Engineer') |
(CS['Designation']=='Hardware Engineer and Associate Engineer') |
(CS['Designation']=='Associate Software Engineer') |
(CS['Designation']=='Software Programmer') |
(CS['Designation']=='Software Engineer Trainee') |
(CS['Designation']=='Software Trainee') |
(CS['Designation']=='Associate Software Engineer')]
req_des.head(2)
| ID | Salary | DOJ | DOL | Designation | JobCity | Gender | DOB | 10percentage | 10board | 12graduation | 12percentage | 12board | CollegeID | CollegeTier | Degree | Specialization | collegeGPA | CollegeCityID | CollegeCityTier | CollegeState | GraduationYear | English | Logical | Quant | Domain | ComputerProgramming | ElectronicsAndSemicon | ComputerScience | MechanicalEngg | ElectricalEngg | TelecomEngg | CivilEngg | conscientiousness | agreeableness | extraversion | nueroticism | openess_to_experience | Year_of_Joining | Year_of_Leaving | Age | YearsOfExperience | Total_Score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 466888 | 325000 | 2014-09-01 | 2016-01-01 | Software Engineer | Pune | f | 1990-11-30 | 79.0 | CBSE | 2008 | 62.20 | CBSE | 7977 | 2 | B.Tech/B.E. | Computer Science Engineering | 76.30 | 7977 | 0 | Uttar Pradesh | 2012 | 485 | 445 | 435 | 0.864685 | 525 | -1 | -1 | -1 | -1 | -1 | -1 | 0.8192 | 0.2668 | -0.2714 | -0.4078 | -0.1295 | 2014 | 2016 | 25 | 1.334247 | 1365.864685 |
| 20 | 140069 | 320000 | 2010-11-01 | 2012-09-01 | Software Engineer | Bangalore | f | 1988-07-25 | 91.2 | State Board | 2006 | 84.63 | State Board | 4437 | 2 | B.Tech/B.E. | Computer Science Engineering | 78.15 | 4437 | 0 | Karnataka | 2010 | 455 | 535 | 445 | 0.635979 | 445 | -1 | -1 | -1 | -1 | -1 | -1 | -0.1082 | 0.9688 | 0.5269 | -0.2902 | 0.3444 | 2010 | 2012 | 27 | 1.835616 | 1435.635979 |
req_des.shape[0]
282
283 Employees from Computer Science Background working as related to Programming Analyst or Software Engineer or Hardware Engineer and Associate Engineer.
Check whether the Average Salary for these Job Designation Lies Between 2.5 to 3 Lakh or Not.
plt.figure(figsize=(12,4))
sns.histplot(x=req_des['Salary'], kde=True, color='orange')
plt.xticks(fontsize=12)
plt.title("Salary Distribution for Perticular Designation", fontsize=18, color='blue');
plt.figure(figsize=(12,4))
sns.boxplot(x=req_des['Salary'], color='orange')
plt.xticks(fontsize=12)
plt.title("Salary Distribution for Perticular Designation", fontsize=18, color='blue');
req_des['Salary'].median()
325000.0
gender_by_specialization=pd.crosstab(index=df['Gender'], columns=df['Specialization']).T
gender_by_specialization.reset_index(inplace=True)
gender_by_specialization.columns=['Specialization', 'Female','Male']
gender_by_specialization
| Specialization | Female | Male | |
|---|---|---|---|
| 0 | Aeronautical Engineering | 1 | 2 |
| 1 | Automobile/Automotive Engineering | 0 | 5 |
| 2 | Biomedical Engineering | 2 | 0 |
| 3 | Biotechnology | 9 | 6 |
| 4 | Ceramic Engineering | 0 | 1 |
| 5 | Chemical Engineering | 1 | 8 |
| 6 | Civil Engineering | 6 | 23 |
| 7 | Computer Application | 59 | 185 |
| 8 | Computer Science Engineering | 361 | 993 |
| 9 | Electrical Engineering | 51 | 229 |
| 10 | Electronics & Communication Engineering | 215 | 688 |
| 11 | Electronics & Instrumentation Engineering | 26 | 67 |
| 12 | Electronics & Telecommunication Engineering | 29 | 98 |
| 13 | Embedded Systems Technology | 0 | 1 |
| 14 | Industrial & Production Engineering | 2 | 8 |
| 15 | Industrial Engineering | 1 | 2 |
| 16 | Information Technology/Science | 183 | 507 |
| 17 | Mechanical Engineering | 11 | 201 |
| 18 | Metallurgical Engineering | 0 | 2 |
| 19 | Other | 0 | 13 |
| 20 | Polymer Technology | 0 | 1 |
| 21 | Power Systems And Automation | 0 | 1 |
fig=px.pie(names='Specialization', values='Female', data_frame=gender_by_specialization,
title='Most common Specialization in Female')
fig.show()
fig=px.pie(names='Specialization', values='Male', data_frame=gender_by_specialization,
title='Most common Specialization in Male')
fig.show()